A Method of Object-based De-duplication

Authors

  • Fang Yan
  • YuAn Tan
Abstract

Today, the world is increasingly awash in unstructured data. This is partly because of the Internet and partly because data that used to be stored on paper or on media such as film, DVDs, and compact discs has moved online [1]. Most of this data is unstructured and comes in diverse formats such as e-mail, documents, graphics, images, and videos. Object storage has a clear advantage in managing the complexity and scale of unstructured data. Object-based data de-duplication is currently the most advanced approach and an effective solution for detecting duplicate data. It can detect common embedded data across completely unrelated files, even on the first backup and even when the physical block layout changes. However, almost all current research on data de-duplication ignores the content of different file types and has no knowledge of the backup data format. It has been shown that such methods cannot achieve optimal performance on compound files.

In our proposed system, we first extract objects from files and then obtain Object_IDs by applying a hash function to the objects. The resulting Object_IDs serve as indexing keys in a B+ tree-like index structure. This avoids the need for a full object index and reduces the search time for duplicate objects to O(log n).

We introduce a new concept, the duplicate object resolver. The object resolver mediates access to all objects and is the central point for managing the metadata and indexes of all objects. Every object is addressable by its ID, which is universally unique. The resolver stores metadata in a triple format. This improved metadata management strategy lets us set, add, and resolve object properties with high flexibility, and allows the same metadata to be reused across duplicate objects.
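The hashing, indexing, and resolver ideas above can be illustrated with a minimal Python sketch. The abstract does not specify the hash function, the index implementation, or the resolver's interface, so everything here is an assumption: SHA-256 as the hash, a sorted list searched with `bisect` standing in for the B+ tree-like index, and the hypothetical methods `put`, `add_property`, `set_property`, and `resolve`. Object extraction from files is out of scope; the example starts from byte strings assumed to be already-extracted objects.

```python
import bisect
import hashlib


def object_id(data: bytes) -> str:
    """Derive an Object_ID by hashing the object's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ObjectResolver:
    """Hypothetical resolver: a sorted index of Object_IDs plus triple-format metadata.

    A sorted list searched with bisect stands in for the B+ tree-like index:
    lookups are O(log n), but inserting into a Python list is O(n), unlike a B+ tree.
    """

    def __init__(self) -> None:
        self._ids: list[str] = []           # sorted Object_IDs (the index keys)
        self._store: dict[str, bytes] = {}  # Object_ID -> stored object bytes
        self._triples: set[tuple[str, str, str]] = set()  # (subject, predicate, value)

    def _contains(self, oid: str) -> bool:
        # Binary search over the sorted keys: O(log n).
        i = bisect.bisect_left(self._ids, oid)
        return i < len(self._ids) and self._ids[i] == oid

    def put(self, data: bytes) -> tuple[str, bool]:
        """Store an object once; return (Object_ID, was_duplicate)."""
        oid = object_id(data)
        if self._contains(oid):
            return oid, True
        bisect.insort(self._ids, oid)
        self._store[oid] = data
        return oid, False

    def add_property(self, oid: str, predicate: str, value: str) -> None:
        # Metadata is a (subject, predicate, value) triple keyed by Object_ID,
        # so every duplicate occurrence of an object shares the same entries.
        self._triples.add((oid, predicate, value))

    def set_property(self, oid: str, predicate: str, value: str) -> None:
        # Replace any existing values for this (oid, predicate) pair.
        self._triples = {t for t in self._triples if not (t[0] == oid and t[1] == predicate)}
        self._triples.add((oid, predicate, value))

    def resolve(self, oid: str, predicate: str) -> list[str]:
        """Return all values recorded for (oid, predicate), sorted."""
        return sorted(v for s, p, v in self._triples if s == oid and p == predicate)


if __name__ == "__main__":
    resolver = ObjectResolver()
    # Hypothetical objects already extracted from two unrelated compound files.
    file_a = [b"<embedded logo image>", b"chapter 1 text"]
    file_b = [b"chapter 2 text", b"<embedded logo image>"]
    results = [resolver.put(obj) for obj in file_a + file_b]
    print([dup for _, dup in results])  # [False, False, False, True]

    logo_id = results[0][0]
    resolver.add_property(logo_id, "contained_in", "file_a")
    resolver.add_property(logo_id, "contained_in", "file_b")
    print(resolver.resolve(logo_id, "contained_in"))  # ['file_a', 'file_b']
```

The embedded image is stored once, and both containing files are recorded as metadata on the single shared Object_ID. A real B+ tree would also keep insertion at O(log n) and page its nodes to disk; the sorted list here only reproduces the O(log n) lookup.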

Similar resources

Enhanced Flush+Reload Attack on AES

In cloud computing, multiple users can share the same physical machine that can potentially leak secret information, in particular when the memory de-duplication is enabled. Flush+Reload attack is a cache-based attack that makes use of resource sharing. T-table implementation of AES is commonly used in the crypto libraries like OpenSSL. Several Flush+Reload attacks on T-table implementat...

Double Cervix with Normal Uterus and Vagina - An Unclassified Müllerian Anomaly

Müllerian anomalies are very common, and a frequent cause of infertility. The most used classification system until now, proposed by the American Society for Reproductive Medicine in 1988, categorizes comprehensively uterine anomalies but fails to classify defects of the cervix or vagina. This is based on a developmental theory that postulates that müllerian duct fusion is unidirectional, begin...

A Rare Case of Duplication of Chromosome 2 (q31.3q36.3) in a 4.5-year-old Boy and Review of the Literature

De novo duplication of 2q is very rare. Most cases of 2q duplications result from familial translocations, and are associated with simultaneous monosomy of another chromosome segment. To our knowledge and search in English literature there are less than 20 reported cases of isolated 2q duplication. Hereby we introduce a 4.5-year-old Iranian boy of a non-consanguineous marriage who was referred ...

Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images

As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...

Analysis Accruing of Sentinel 2A Image’s Classification Methods Based on Object Base and Pixel Base in Flood Area Zoning of Taleqan River

Flood zonation mapping is one of the priorities for the soil and water management, which Remote Sensing (RS) capabilities are very applicable to this issue. The main objective of this research was study of accuracy of the Object oriented and Pixel based methods for flood zonation mapping in the Taleghan River basin. Therefore, the Sentinel 2A satellite image of the study area classified using s...

Using a Novel Concept of Potential Pixel Energy for Object Tracking

Abstract   In this paper, we propose a new method for kernel based object tracking which tracks the complete non rigid object. Definition the union image blob and mapping it to a new representation which we named as potential pixels matrix are the main part of tracking algorithm. The union image blob is constructed by expanding the previous object region based on the histogram feature. The pote...

Journal title:
  • JNW

Volume 6, Issue: -

Pages: -

Publication date: 2011